Abstract

In a recent paper [4], Efron pointed out that an important issue in large-scale multiple 
hypothesis testing is that the null distribution may be unknown and may need to be estimated.
Consider a Gaussian mixture model, where the null distribution is known to be normal but 
both null parameters — the mean and the variance — are unknown. We address the problem 
with a method based on Fourier transformation. The Fourier approach was first studied 
by Jin and Cai [9], which focuses on the scenario where any non-null effect has either the 
same or a larger variance than that of the null effects. In this paper, we review the main 
ideas in [9], and propose a generalized Fourier approach to tackle the problem under another 
scenario: any non-null effect has a larger mean than that of the null effects, but no constraint 
is imposed on the variance. This approach and that in [9] complement each other: each
approach is successful in a wide class of situations where the other fails. Also, we extend the 
Fourier approach to estimate the proportion of non-null effects. The proposed procedures 
perform well both in theory and on simulated data. 

Keywords: empirical null, Fourier transformation, generalized Fourier transformation, pro- 
1 Introduction



Large-scale multiple testing is a recent area of active research in statistics, where one tests
thousands or even millions of null hypotheses simultaneously:

$$H_j, \qquad j = 1, \ldots, n.$$

Associated with each null hypothesis is a test statistic $X_j$, which, depending on the situation,
can be a summary statistic, a p-value, a regression coefficient, a transform coefficient, etc.
We say that $X_j$ contains a null effect if $H_j$ is true, and a non-null effect otherwise.

A convenient model is the Bayesian hierarchical model (see, e.g., [5]), which we now describe. Fix
$0 < \epsilon < 1$. For each $1 \le j \le n$, we flip a coin with probability $\epsilon$ of landing tails. If the coin
lands heads, we draw $X_j$ from a common density function $f_0(x)$, which we call the null density.
If the coin lands tails, we draw $X_j$ from an individual density function $\xi_j(x)$, where $\xi_j$ itself is
randomly generated according to a fixed probability measure $G$. In effect, the non-null $X_j$ can be
viewed as samples from the density $f_1(x) = \int \xi(x)\, dG(\xi)$, which we call the alternative density; see [5].
Marginally, $X_j$ can be deemed as samples from the following two-component mixture density:

$$X_j \sim (1 - \epsilon) f_0(x) + \epsilon f_1(x) \equiv f(x). \tag{1.1}$$

The parameter $\epsilon$ is closely related to the proportion of non-null effects (i.e., the fraction of
null hypotheses that are untrue). In fact, under this mixture model, the number of
untrue hypotheses is distributed as Binomial with parameters $n$ and $\epsilon$. So when $n$ is large, the
difference between $\epsilon$ and the actual fraction is $O_p(\sqrt{\epsilon/n})$ and is usually negligible. For this
reason, we call $\epsilon$ the proportion of non-null effects in this paper.
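The coin-flip description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the null is taken to be Gaussian, and the alternative sampler `uniform_alt` borrows the Uniform choices used later in the simulation section; the function names and the seed are hypothetical and not part of the paper.

```python
import numpy as np


def sample_two_component_mixture(n, eps, u0, sigma0, draw_alt, rng):
    """Draw X_1..X_n from (1 - eps) * N(u0, sigma0^2) + eps * f_1.

    draw_alt(k, rng) must return k draws from the alternative density f_1.
    """
    is_nonnull = rng.random(n) < eps       # coin flip: tails with probability eps
    x = rng.normal(u0, sigma0, size=n)     # start with null draws everywhere
    k = int(is_nonnull.sum())
    x[is_nonnull] = draw_alt(k, rng)       # overwrite the non-null positions
    return x, is_nonnull


def uniform_alt(k, rng):
    """Assumed alternative: u ~ Uniform(1, 2), sigma ~ Uniform(0.5, 1.5)."""
    u = rng.uniform(1.0, 2.0, size=k)
    sigma = rng.uniform(0.5, 1.5, size=k)
    return rng.normal(u, sigma)


rng = np.random.default_rng(0)
n, eps = 100_000, 0.05
x, is_nonnull = sample_two_component_mixture(n, eps, -1.0, 1.0, uniform_alt, rng)
frac = is_nonnull.mean()                   # realized fraction of non-null effects
print(frac, np.sqrt(eps * (1 - eps) / n))  # fraction and its binomial standard error
```

The realized fraction differs from $\epsilon$ only by a binomial fluctuation, which is why the two are used interchangeably in the text.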

The null density is the starting point for any testing procedures. In many scenarios, the 
null density is assumed as known. However, somewhat surprisingly, this assumption may be 
incorrect in some multiple testing situations as pointed out by Efron [4]. Efron illustrated his 
point with a breast cancer microarray data set, which is based on 15 patients, 7 having a BRCA1
mutation and 8 having a BRCA2 mutation. For each patient, the same set of 3226 genes was
measured, and it is of interest to find which genes are differentially expressed. For each gene, a
Studentized t-score was calculated and then transformed to a z-score (see [4] for the details).
Efron argued that, although the theoretical null should be the standard normal $N(0, 1)$, another
null density, $N(0.02, 2.50)$, seems to be more appropriate. Efron called the latter the empirical
null and demonstrated convincingly that it is better to use the empirical null instead of the 
theoretical null in many situations. 

There are many possible reasons why the empirical null may be different from the theoretical 
null. Take the breast cancer microarray data for example: the Studentized t-statistics may not
be truly t-distributed due to failed distributional assumptions. There may be covariates (such
as the age of the patients) that have not been observed in the data. The correlation across different
genes (also that across different arrays) has been neglected. All these factors may drive the 
empirical null far from the theoretical null. 

Unfortunately, unlike the theoretical null, the empirical null is usually unknown. Thus how 
to estimate the empirical null is a problem of major interest. 



1.1 Identifiability issue and constrained Gaussian mixture models 

Note that in Model (1.1), some may call $f_0$ the null density, and some may call $f_1$ the null
density. To resolve this issue, we fix a constant $\epsilon_0 \in (0, 1/2)$ and assume

$$0 < \epsilon \le \epsilon_0,$$

so that the null density is tied to the majority of the hypotheses.

We adopt the Gaussian model as suggested in Efron [4]. In detail, let $\phi(\cdot)$ be the density
of $N(0, 1)$. We assume that the null density $f_0$ is Gaussian with an unknown mean $u_0$ and an
unknown variance $\sigma_0^2$:

$$f_0(x) = \frac{1}{\sigma_0} \phi\Big(\frac{x - u_0}{\sigma_0}\Big).$$

At the same time, we assume that the alternative density $f_1$ is a Gaussian mixture (both a
location mixture and a scale mixture) with a bivariate mixing distribution $H(u, \sigma)$:

$$f_1(x) = \int \frac{1}{\sigma} \phi\Big(\frac{x - u}{\sigma}\Big)\, dH(u, \sigma).$$

The marginal density of $X_j$ is then

$$f(x) = f(x; u_0, \sigma_0, \epsilon, H) = (1 - \epsilon) \frac{1}{\sigma_0} \phi\Big(\frac{x - u_0}{\sigma_0}\Big) + \epsilon \int \frac{1}{\sigma} \phi\Big(\frac{x - u}{\sigma}\Big)\, dH(u, \sigma). \tag{1.2}$$

With the Gaussian model, the problem of estimating the null density reduces to the problem
of estimating the null parameters $(u_0, \sigma_0^2)$.

However, the null density in the above Gaussian model is not always identifiable. This is
because, without constraints on $H(\cdot, \cdot)$, $f_1$ can be very close or even identical to $f_0$. Fortunately,
there are many natural constraints that we can put on $H(\cdot, \cdot)$ to resolve this problem. Below are
some examples.

Definition 1.1 Fix $\epsilon_0 \in (0, 1/2)$, $u_0$, and $\sigma_0 > 0$. We say that $f(x) = f(x; u_0, \sigma_0, \epsilon, H)$
is a Gaussian mixture density constrained with Elevated Variances with respect to parameters
$(u_0, \sigma_0, \epsilon_0)$ if it has the form as in (1.2), and the proportion $\epsilon$ and the mixing distribution
$H$ satisfy

$$0 < \epsilon \le \epsilon_0, \qquad P_H(\sigma \ge \sigma_0) = 1, \qquad P_H((u, \sigma) \ne (u_0, \sigma_0)) = 1. \tag{1.3}$$

We refer to the Gaussian model (1.2) with the constraints in (1.3) as $GEV(u_0, \sigma_0, \epsilon_0)$.

For short, we write $GEV(u_0, \sigma_0, \epsilon_0)$ as GEV whenever there is no confusion. In the definition
above, $u$ and $\sigma$ denote the location and scale parameters from the mixing distribution, and $P_H$
denotes the probability under the mixing distribution $H(\cdot, \cdot)$. This models a situation where the
variance associated with an individual non-null effect is no less than that of a null effect. The
following lemma shows that, given (1.3), the triplet $(u_0, \sigma_0, \epsilon)$ is uniquely determined by $f(x)$,
and the identifiability issue is therefore resolved.

Lemma 1.1 Given a density $f(x) = f(x; u_0, \sigma_0, \epsilon, H)$ satisfying (1.2) and (1.3), the parameters
$u_0$, $\sigma_0$, and $\epsilon$ are uniquely determined by $f(x)$.

This lemma is proved in Section 5. Note that if we replace the constraint $P_H(\sigma \ge \sigma_0) = 1$ by
$P_H(\sigma < \sigma_0) = 1$, then the identifiability issue persists (the construction of counterexamples is
elementary and we skip it).

Alternatively, we define GEM as follows.

Definition 1.2 Fix $\epsilon_0 \in (0, 1/2)$, $u_0$, and $\sigma_0 > 0$. We say that $f(x) = f(x; u_0, \sigma_0, \epsilon, H)$ is a
Gaussian mixture density constrained with Elevated Means with respect to parameters $(u_0, \sigma_0, \epsilon_0)$
if it has the form as in (1.2), and the proportion $\epsilon$ and the mixing distribution $H$ satisfy

$$0 < \epsilon \le \epsilon_0, \qquad P_H(u > u_0) = 1. \tag{1.4}$$

We refer to the Gaussian model (1.2) with the constraints in (1.4) as $GEM(u_0, \sigma_0, \epsilon_0)$.

GEM models a situation where the mean associated with an individual non-null effect is larger
than that of a null effect. The following lemma is proved in Section 5.

Lemma 1.2 Given a density $f(x) = f(x; u_0, \sigma_0, \epsilon, H)$ satisfying (1.2) and (1.4), the parameters
$u_0$, $\sigma_0$, and $\epsilon$ are uniquely determined by $f(x)$.

For the case where we replace the constraint $P_H(u > u_0) = 1$ in (1.4) with $P_H(u < u_0) = 1$, the
discussion is similar. Also, we can relax the constraint to $P_H(u \ge u_0) = 1$, but doing so requires
some conditions on $\sigma$. For reasons of space, we skip the discussion along these two lines.

GEV and GEM are the two main models we study in this paper. Despite the additional 
constraints, both models are broad enough to accommodate many interesting cases that arise in 
real applications. In sections below, we discuss possible approaches to consistently estimating 
the null parameters in GEV and GEM. 

1.2 A Fourier approach to estimating the null parameters in GEV 

Conventionally one estimates the null parameters with either empirical moments or extreme 
observations. However, in these quantities, the information containing the null parameters is 
highly distorted by the non-null effects. A non-orthodox approach is therefore necessary. In a 
recent work [9], Jin and Cai proposed a Fourier approach to estimating the null parameters in 
GEV. We now briefly explain the idea. 

When it comes to a density function, one usually pictures it as a smooth curve that spreads
over the real line. Joseph Fourier taught us a different viewpoint: a normal density $N(u, \sigma^2)$
is not only a bell-shaped curve centered at $u$, but also a wave that oscillates at the frequency $u$. In
fact, the Fourier transform of the density $N(u, \sigma^2)$ can be decomposed into two components: the
amplitude function determined by $\sigma^2$, and the phase function determined by $u$:

$$e^{-\sigma^2 t^2/2} \cdot e^{itu} = \text{Amplitude function} \cdot \text{Phase function}, \qquad i = \sqrt{-1}. \tag{1.5}$$

Consequently, we can view a Gaussian mixture as a superposition of waves with different
frequencies and different amplitudes.

We now invoke GEV. The above investigation gives rise to an interesting approach to estimating
the null parameters. Denote the empirical characteristic function by

$$\psi_n(t) = \psi_n(t; X_1, \ldots, X_n) = \frac{1}{n} \sum_{j=1}^n e^{itX_j}.$$

For an appropriately large frequency $t$, the stochastic fluctuation is negligible and $\psi_n$ reduces
to its non-stochastic counterpart, the underlying characteristic function $\psi(t) = E[\psi_n(t)]$. By
direct calculations,

$$\psi(t) = \psi(t; u_0, \sigma_0, \epsilon, H) = \psi_0(t)[1 + s(t)],$$

where

$$\psi_0(t) = \psi_0(t; u_0, \sigma_0, \epsilon) = (1 - \epsilon) e^{iu_0 t - \sigma_0^2 t^2/2},$$

and

$$s(t) = s(t; u_0, \sigma_0, \epsilon, H) = \frac{\epsilon}{1 - \epsilon} \int e^{i(u - u_0)t - (\sigma^2 - \sigma_0^2)t^2/2}\, dH(u, \sigma). \tag{1.6}$$

Now, with GEV and a mild extra condition, $s(t) \approx 0$. For example, if we assume that
$P_H(\sigma > \sigma_0) = 1$, then at a high frequency $t$,

$$|s(t)| \le \int e^{-(\sigma^2 - \sigma_0^2)t^2/2}\, dH(u, \sigma) \approx 0. \tag{1.7}$$

This says that in GEV, as the frequency $t$ tends to $\infty$, the waves corresponding to the alternative
density damp faster than the wave associated with the null density. Therefore, the information
containing the null parameters is asymptotically preserved in the high-frequency Fourier transform,
where the distortion of non-null effects is negligible. In other words, for an appropriately large
frequency $t$,

$$\psi_n(t) \approx \psi(t) \approx \psi_0(t).$$

Now, since $\psi_0(t)$ has a very simple form, we can solve for $(u_0, \sigma_0)$ (and also $\epsilon$) from it.

The elaboration of the idea gives rise to the estimators in [9], which are proved to be uniformly
consistent for the null parameters across a wide range of mixing distributions $H(\cdot, \cdot)$. It was also
shown in [3] that these estimators attain the optimal rate of convergence; see the details therein.
These works reveal that, somewhat surprisingly, the right place to estimate the null parameters
is in the frequency domain, rather than in the spatial domain as one may have expected.
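The damping claim can be checked numerically with a small sketch. Everything specific here is an assumption made for illustration: the sample size, the proportion 0.10, and an alternative with $\sigma \sim$ Uniform(1.5, 2.5) so that $\sigma > \sigma_0 = 1$ as GEV requires. The code only compares the empirical characteristic function with $\psi_0$; it does not reproduce the estimators of [9].

```python
import numpy as np


def empirical_cf(x, t):
    """psi_n(t) = (1/n) * sum_j exp(i t X_j)."""
    return np.exp(1j * t * x).mean()


rng = np.random.default_rng(1)
n, eps, u0, sigma0 = 400_000, 0.10, 0.0, 1.0
k = int(round(n * eps))                     # number of non-null effects
x = np.concatenate([
    rng.normal(u0, sigma0, n - k),          # null effects
    rng.normal(rng.uniform(-2, 2, k),       # non-null: elevated variances only
               rng.uniform(1.5, 2.5, k)),
])

rel_err = {}
for t in (0.5, 1.0, 2.0):
    psi_0 = (1 - eps) * np.exp(1j * u0 * t - sigma0**2 * t**2 / 2)
    rel_err[t] = abs(empirical_cf(x, t) - psi_0) / abs(psi_0)
    print(f"t = {t}: |psi_n - psi_0| / |psi_0| = {rel_err[t]:.4f}")
```

The relative distortion shrinks as $t$ grows, but $|\psi_0(t)|$ also decays, so at very high frequencies the sampling noise takes over; this is the bias and variance tradeoff that governs the choice of $t$.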

1.3 A generalized Fourier approach to estimating the null parameters in GEM

Despite its encouraging performance in GEV, the above approach does not yield a satisfactory
estimate in GEM. To see the point, note that the key to the success of the above approach
is (1.7), which critically depends on the assumption that $P_H(\sigma > \sigma_0) = 1$. Such an
assumption does not hold in GEM. As a result, the above approach ceases to perform well.

Fortunately, there is an easy fix. The key is to replace the Fourier transformation by the
generalized Fourier transformation (to be introduced below), so that in the frequency domain,
the roles of the mean and the variance are, in a sense, "swapped". In detail, let

$$\omega = -(1 + i)/\sqrt{2} \qquad (\text{note } \omega^2 = i).$$

For any density function $h(x)$, the generalized Fourier transformation is

$$\int h(x) \exp(\omega t x)\, dx,$$

provided that the function $h(x) \exp(\omega t x)$ is absolutely integrable. In particular, the generalized
Fourier transform of the Gaussian density $N(u, \sigma^2)$ is

$$\exp\Big(-\frac{ut}{\sqrt{2}}\Big) \cdot \exp\Big(i\Big[-\frac{ut}{\sqrt{2}} + \frac{\sigma^2 t^2}{2}\Big]\Big) = \text{Amplitude function} \cdot \text{Phase function}. \tag{1.8}$$

Now, the amplitude is uniquely determined by the mean (compare with (1.5)).
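As a quick check of (1.8), the computation can be sketched as follows. It assumes only the standard Gaussian moment generating function $E[e^{sX}] = e^{su + s^2\sigma^2/2}$, which remains valid for complex $s$; the substitution $s = \omega t$ is the only step not spelled out in the text.

```latex
\begin{align*}
\int \frac{1}{\sigma}\,\phi\Big(\frac{x-u}{\sigma}\Big)\, e^{\omega t x}\,dx
  &= \exp\Big(\omega t u + \frac{\omega^2 t^2 \sigma^2}{2}\Big)
   = \exp\Big(-\frac{(1+i)\,ut}{\sqrt{2}} + \frac{i\sigma^2 t^2}{2}\Big) \\
  &= \exp\Big(-\frac{ut}{\sqrt{2}}\Big)\cdot
     \exp\Big(i\Big[-\frac{ut}{\sqrt{2}} + \frac{\sigma^2 t^2}{2}\Big]\Big).
\end{align*}
```

The modulus is $e^{-ut/\sqrt{2}}$, which involves only the mean $u$, whereas in (1.5) the modulus involves only the variance.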

The remaining part of the idea is similar to that in the preceding section. Denote the
generalized empirical characteristic function by

$$\varphi_n(t) = \varphi_n(t; X_1, \ldots, X_n) = \frac{1}{n} \sum_{j=1}^n \exp(\omega t X_j).$$

For large $n$ and an appropriately chosen $t$, one expects that the stochastic fluctuation is negligible,
and that $\varphi_n(t)$ reduces approximately to the generalized characteristic function,

$$\varphi(t) = \varphi(t; u_0, \sigma_0, \epsilon, H) = E[\varphi_n(t)].$$

Direct calculations show that

$$\varphi(t) = \varphi_0(t)[1 + r(t)],$$

where

$$\varphi_0(t) = \varphi_0(t; u_0, \sigma_0, \epsilon) = (1 - \epsilon) \exp(\omega u_0 t + i\sigma_0^2 t^2/2),$$

and

$$r(t) = r(t; u_0, \sigma_0, \epsilon, H) = \frac{\epsilon}{1 - \epsilon} \int \exp\big(\omega(u - u_0)t + i(\sigma^2 - \sigma_0^2)t^2/2\big)\, dH(u, \sigma). \tag{1.9}$$

Recalling that $\omega = -(1 + i)/\sqrt{2}$, it is seen that

$$|r(t)| \le \frac{\epsilon}{1 - \epsilon} \int e^{-(u - u_0)t/\sqrt{2}}\, dH(u, \sigma).$$

We now invoke GEM. Similarly, since $P_H(u > u_0) = 1$, $r(t) \approx 0$ for large $t$. We expect
that

$$\varphi_n(t) \approx \varphi(t) \approx \varphi_0(t).$$

Again, $\varphi_0(t)$ has a very simple form and we can solve for $(u_0, \sigma_0)$ (and also $\epsilon$) from it. In fact,
introduce two functionals $u_0(\cdot; t)$ and $\sigma_0^2(\cdot; t)$ by

$$u_0(g; t) = -\frac{\sqrt{2}}{|g(t)|} \frac{d}{dt} |g(t)|, \qquad \sigma_0^2(g; t) = \frac{\sqrt{2}\, \mathrm{Re}\big(\omega\, \overline{g(t)}\, g'(t)\big)}{t\, |g(t)|^2}, \tag{1.10}$$

where $g$ is any complex-valued differentiable function, and $|z|$, $\mathrm{Re}(z)$ and $\bar{z}$ denote the modulus,
the real part, and the complex conjugate of a complex number $z$, respectively. The following
lemma says that plugging $g = \varphi_0$ into the two functionals gives the desired parameters $u_0$ and $\sigma_0^2$,
respectively.

Lemma 1.3 For all $t \ne 0$, $u_0(\varphi_0; t) = u_0$ and $\sigma_0^2(\varphi_0; t) = \sigma_0^2$.

Lemma 1.3 can be proved using elementary algebra, so we skip it. Taking $g = \varphi_n$ in (1.10), we
expect to have

$$u_0(\varphi_n; t) \approx u_0(\varphi; t) \approx u_0, \qquad \sigma_0^2(\varphi_n; t) \approx \sigma_0^2(\varphi; t) \approx \sigma_0^2.$$
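For readers who want to see the elementary algebra behind Lemma 1.3, one possible route is sketched below. It uses only the closed form $\varphi_0(t) = (1 - \epsilon)\exp(\omega u_0 t + i\sigma_0^2 t^2/2)$ and the identities $\omega^2 = i$ and $i\omega = (1 - i)/\sqrt{2}$; it is a sketch, not the authors' proof.

```latex
\begin{align*}
|\varphi_0(t)| &= (1-\epsilon)\, e^{-u_0 t/\sqrt{2}},
  \qquad\text{so}\qquad
  -\frac{\sqrt{2}}{|\varphi_0(t)|}\,\frac{d}{dt}|\varphi_0(t)| = u_0, \\
\varphi_0'(t) &= (\omega u_0 + i\sigma_0^2 t)\,\varphi_0(t),
  \qquad\text{so}\qquad
  \omega\,\overline{\varphi_0(t)}\,\varphi_0'(t)
  = \Big(i u_0 + \frac{1-i}{\sqrt{2}}\,\sigma_0^2 t\Big)\,|\varphi_0(t)|^2 .
\end{align*}
```

Taking real parts in the last identity gives $\mathrm{Re}(\omega\,\overline{\varphi_0}\,\varphi_0') = \sigma_0^2 t\,|\varphi_0|^2/\sqrt{2}$, and multiplying by $\sqrt{2}/(t|\varphi_0|^2)$ recovers $\sigma_0^2$.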

In this paper, we carefully study the bias and variance of $u_0(\varphi_n; t)$ and $\sigma_0^2(\varphi_n; t)$, and
investigate which choices of $t$ give a good tradeoff between the bias and the variance. We find
that as $n$ tends to $\infty$, if we set $t$ in an appropriate range, then both estimators are consistent
for their estimands, uniformly so across a wide class of situations.
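A minimal Python sketch of the two estimators follows. It is an illustration under assumptions, not the authors' implementation: the derivative $\varphi_n'(t)$ is taken term by term, the identity $\frac{d}{dt}|g| = \mathrm{Re}(\bar{g} g')/|g|$ is used for the modulus, the frequency is set to $\sqrt{0.2 \log n}$ as in the calibration of Section 2.1, and the demo data and function names are hypothetical.

```python
import numpy as np

OMEGA = -(1 + 1j) / np.sqrt(2)  # omega in the text; OMEGA**2 equals 1j


def null_functionals(g, dg, t):
    """Evaluate u_0(g; t) and sigma_0^2(g; t) from the values g(t) and g'(t)."""
    mod = abs(g)
    dmod = (np.conj(g) * dg).real / mod        # d/dt |g(t)| = Re(conj(g) g') / |g|
    u0 = -np.sqrt(2) * dmod / mod
    var0 = np.sqrt(2) * (OMEGA * np.conj(g) * dg).real / (t * mod**2)
    return u0, var0


def estimate_null(x, t):
    """Plug the generalized empirical characteristic function into the functionals."""
    e = np.exp(OMEGA * t * x)
    phi_n = e.mean()                           # phi_n(t)
    dphi_n = (OMEGA * x * e).mean()            # phi_n'(t), differentiated term by term
    return null_functionals(phi_n, dphi_n, t)


# Hypothetical demo data: null N(-1, 1), 5% non-null effects with elevated means.
rng = np.random.default_rng(0)
n, eps = 50_000, 0.05
k = int(round(n * eps))
x = np.concatenate([
    rng.normal(-1.0, 1.0, n - k),
    rng.normal(rng.uniform(1, 2, k), rng.uniform(0.5, 1.5, k)),
])
t = np.sqrt(0.2 * np.log(n))                   # assumed frequency choice
u0_hat, var0_hat = estimate_null(x, t)
print(u0_hat, var0_hat)                        # estimates of (u_0, sigma_0^2)
```

Separating `null_functionals` from `estimate_null` makes it easy to verify Lemma 1.3 numerically by passing the exact $\varphi_0(t)$ and $\varphi_0'(t)$.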
1.4 Estimating the proportion of non-null effects

Seemingly, the approach can be readily generalized to estimate the proportion of non-null effects 
e. How to estimate the proportion has been the topic of many recent works in the area of 
large-scale multiple hypothesis testing. See for example [3J [5] O [5J [S] [TH] HH HH Q3] . There are 
two reasons for the enthusiasm. In some applications, the proportion is the quantity that is of 
direct interest [11] ; while more often, knowing the proportion helps to improve many multiple 
testing procedures, such as the FDR procedure by Benjamini and Hochberg [2], the local FDR
procedure by Efron et al. [3] and the optimal discovery function by Storey [13] . See [8] for more 
discussions. 

In Section 3, we extend the generalized Fourier approach to estimating the proportion in
GEM. We discuss two different cases: (1) the null parameters are known; and (2) the null 
parameters are unknown. In both cases, we find that the estimators are uniformly consistent 
with the proportion across a wide class of situations. 

We remark that the success of the Fourier approach for estimating the null parameters and 
the proportion is not coincidental. It stems from the key fact that the null density can be
isolated from the alternative density in the high-frequency Fourier coefficients. Naturally, we
expect the Fourier approach to be successful in estimating many other quantities as well.

The remaining part of the paper is organized as follows. Section 2 studies the problem
of estimating the null parameters in GEM. We show that by choosing an appropriate $t$, the
estimators $u_0(\varphi_n; t)$ and $\sigma_0^2(\varphi_n; t)$ are consistent for the true parameters, uniformly across a wide
class of situations. Section 3 studies the problem of estimating the proportion. While the studies
in Sections 2 and 3 are asymptotic, we carry out a few simulation studies in Section 4 and investigate
the performance of the proposed estimators for moderately large $n$. Section 5 contains the proofs
of the theorems and lemmas, in the order they appear.

2 Main results 

In this section, we limit our attention to GEM and study the estimation errors of $u_0(\varphi_n; t)$ and
$\sigma_0^2(\varphi_n; t)$. Since the discussions are similar, we focus on that of $u_0(\varphi_n; t)$. For the asymptotic
analysis, we adopt a framework where both $\epsilon$ and $H$ may depend on $n$ as $n$ ranges from 1 to $\infty$
(denoted by $\epsilon_n$ and $H_n$). This covers a much broader class of situations than the one where
$(\epsilon, H)$ are fixed as $n$ ranges from 1 to $\infty$.

2.1 Asymptotic framework 

Recall that the test statistics $X_j$ are iid samples from

$$f(x) = f(x; u_0, \sigma_0, \epsilon_n, H_n, n) = (1 - \epsilon_n) \frac{1}{\sigma_0} \phi\Big(\frac{x - u_0}{\sigma_0}\Big) + \epsilon_n \int \frac{1}{\sigma} \phi\Big(\frac{x - u}{\sigma}\Big)\, dH_n(u, \sigma). \tag{2.11}$$

As before, fix $\epsilon_0 \in (0, 1/2)$. We suppose that for any $n \ge 1$,

$$0 < \epsilon_n \le \epsilon_0. \tag{2.12}$$

Of course, the condition can be relaxed so that it only holds for sufficiently large $n$.
Also, fixing $A > 0$, assume that

$$u_0 > -A, \qquad \sigma_0^2 \le A. \tag{2.13}$$

In addition, we assume that for any $n \ge 1$,

$$P_{H_n}(u > u_0) = 1, \qquad P_{H_n}(\sigma^2 \le A) = 1. \tag{2.14}$$

These conditions are relatively mild, except for the second one in (2.14). We need this
condition to control the variance of the estimators (whether this condition can be significantly
relaxed is an open question, which we leave to future study). In short, we focus the study
on the following class of marginal densities:

$$\Lambda_n(\epsilon_0, A) = \{ f(x) = f(x; u_0, \sigma_0, \epsilon_n, H_n, n) \text{ has the form in (2.11) and satisfies (2.12)-(2.14)} \}.$$
For any $t > 0$, it follows from the triangle inequality that

$$|u_0(\varphi_n; t) - u_0| \le |u_0(\varphi_n; t) - u_0(\varphi; t)| + |u_0(\varphi; t) - u_0|.$$

On the right-hand side, the first term is the stochastic term, and the second term is the bias
term. The performance of the estimator therefore depends on the choice of $t$: a larger $t$ tends to
give a larger stochastic fluctuation but a smaller bias. It turns out that the interesting range of
$t$ is $O(\sqrt{\log n})$. In light of this, we calibrate $t$ through a parameter $\gamma$ by

$$t = t_n(\gamma) = \sqrt{\gamma \log n}, \qquad \gamma > 0.$$

We now study the stochastic term and the bias term separately.
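To get a feel for this calibration, the short sketch below tabulates $t_n(\gamma)$ and the quantity $n^{A\gamma - 1} = e^{A t_n^2(\gamma)}/n$, which is the order of the variance bound appearing in Section 2.2. The value $A = 2.25$ and the helper names are illustrative assumptions, not choices made in the paper.

```python
import math


def t_n(n, gamma):
    """Frequency calibration t_n(gamma) = sqrt(gamma * log n)."""
    return math.sqrt(gamma * math.log(n))


def variance_order(n, gamma, A):
    """n^(A * gamma - 1), which equals exp(A * t_n(gamma)^2) / n."""
    return n ** (A * gamma - 1)


A = 2.25  # hypothetical bound on the variances (e.g. sigma <= 1.5)
for n in (10**4, 10**5, 10**6):
    print(n, round(t_n(n, 0.2), 3), variance_order(n, 0.2, A))
```

Because $t_n(\gamma)$ grows only like $\sqrt{\log n}$, the frequency stays moderate even for very large $n$, while the variance order still tends to zero whenever $\gamma < 1/A$.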



2.2 The stochastic term 

We need the following definition. 

Definition 2.1 Fixing a constant $r$, we say that a sequence $\{b_n\}_{n \ge 1}$ is $\bar{o}(n^{-r})$ if $n^{r - \delta} |b_n| \to 0$
as $n \to \infty$, for all $\delta > 0$. In particular, when $r = 0$, we write $\bar{o}(1)$.

First, we study the stochastic fluctuation of $\varphi_n(t)$ and $\varphi_n'(t)$. The following lemmas are
proved in Section 5.

Lemma 2.1 Fix $\epsilon_0 \in (0, 1/2)$, $A > 0$, and $\gamma \in (0, 1/A)$. As $n$ tends to $\infty$,

$$\sup_{\{f \in \Lambda_n(\epsilon_0, A)\}} \{\mathrm{Var}(\varphi_n(t_n(\gamma)))\} \lesssim n^{A\gamma - 1}, \tag{2.15}$$

and

$$\sup_{\{f \in \Lambda_n(\epsilon_0, A)\}} \{\mathrm{Var}(\varphi_n'(t_n(\gamma)))\} \lesssim 4A^2 \gamma \log(n) \cdot n^{A\gamma - 1}. \tag{2.16}$$

The upper bounds in (2.15)-(2.16) may be conservative, especially when $\epsilon_n$ is small. See the
proof for the details (for two positive sequences, we write $a_n \lesssim b_n$ if $a_n / b_n \le 1 + o(1)$ for
sufficiently large $n$).

We now relate the stochastic fluctuations of $u_0(\varphi_n; t)$ and $\sigma_0^2(\varphi_n; t)$ to those of $\varphi_n(t)$ and
$\varphi_n'(t)$. This is achieved by the following lemma.

Lemma 2.2 Let $u_0(\cdot; \cdot)$ and $\sigma_0^2(\cdot; \cdot)$ be defined as in (1.10). Fix $t > 0$. For any differentiable
complex-valued functions $f$ and $g$ satisfying $|f(t)| \ne 0$ and $|g(t)| \ne 0$,

$$|u_0(g; t) - u_0(f; t)| \le \frac{1}{|f(t)|^2} \Big[ \big( |u_0(g; t)| \cdot (|f(t)| + |g(t)|) + \sqrt{2}\, |g'(t)| \big)\, |f(t) - g(t)| + \sqrt{2}\, |f(t)| \cdot |(f(t) - g(t))'| \Big],$$
and 

|a 2 ( ff) *)-^(/,t)|<^^.^^ 



Apply Lemma 2.2 with $f = \varphi_n$ and $g = \varphi$. Intuitively,

$$\varphi_n(t) \approx \varphi(t), \qquad \varphi_n'(t) \approx \varphi'(t).$$

Also, when t = t n (j), 

<Pn(tn(nr)) = 0(1). 

We therefore expect to have 

M^,*n(7))-**(P;tn(7)^ 

As a result, we have the following lemma. 
Lemma 2.3 Fix $\epsilon_0 \in (0, 1/2)$, $A > 0$, and $\gamma \in (0, 1/A)$. As $n$ tends to $\infty$, except for a probability
that tends to 0,

$$\sup_{\{f \in \Lambda_n(\epsilon_0, A)\}} \{ |u_0(\varphi_n; t_n(\gamma)) - u_0(\varphi; t_n(\gamma))| \} \le \bar{o}(n^{(A\gamma - 1)/2}),$$

and

$$\sup_{\{f \in \Lambda_n(\epsilon_0, A)\}} \{ |\sigma_0^2(\varphi_n; t_n(\gamma)) - \sigma_0^2(\varphi; t_n(\gamma))| \} \le \bar{o}(n^{(A\gamma - 1)/2}).$$

In conclusion, except for a probability that tends to 0, the stochastic fluctuation of either
estimator is of the order of $n^{(A\gamma - 1)/2}$. Note that the exponent $(A\gamma - 1)/2 < 0$.

2.3 The bias term 

We now discuss the bias term. The following lemma is proved in Section 5.

Lemma 2.4 Fix $\epsilon_0 \in (0, 1/2)$ and $A > 0$. Let $r(t) = r(t; u_0, \sigma_0, \epsilon_n, H_n)$ be as in (1.9). For any
$t > 0$ and $f \in \Lambda_n(\epsilon_0, A)$, there exists a universal constant $C > 0$ such that

$$|u_0(\varphi; t) - u_0| \le C\, |r'(t)|, \qquad |\sigma_0^2(\varphi; t) - \sigma_0^2| \le C\, |r'(t)| / t.$$

Write for short $t_n = t_n(\gamma)$. Under mild conditions, $r'(t_n) \to 0$. We now show some examples
where this is the case.

Example 1. The non-null effects are sparse. In this case, we suppose that the parameter $\epsilon_n$
tends to 0 as $n$ tends to $\infty$ at a rate faster than that of $1/t_n$. By the proof of Lemma 2.3,

$$|r'(t_n)| \le \frac{\epsilon_n}{1 - \epsilon_n} \Big[ \frac{\sqrt{2}}{e\, t_n} + A t_n \Big].$$

So as long as $\epsilon_n t_n \to 0$, $|r'(t_n)| \to 0$, regardless of the distribution $H_n(\cdot, \cdot)$ (of course, the
condition $P_{H_n}(u > u_0) = 1$ is still needed).

Example 2. Elevated means. In this case, we suppose that the means of the non-null effects exceed
the null mean by at least a small amount $\delta_n > 0$:

$$P_{H_n}(u \ge u_0 + \delta_n) = 1.$$

Recall that $u_0 > -A$ and that for any density in $\Lambda_n(\epsilon_0, A)$, $P_{H_n}(|\sigma^2 - \sigma_0^2| \le A) = 1$. Similar to the
proof of Lemma 2.3,

$$|r'(t_n)| \le \frac{\epsilon_n}{1 - \epsilon_n} \Big( \delta_n + \frac{\sqrt{2}}{e\, t_n} + A t_n \Big) e^{-\delta_n t_n/\sqrt{2}}.$$

As a result, we have the following lemma, whose proof is elementary so we omit it.

Lemma 2.5 If there is some constant $c_0 > 0$ such that

$$\liminf_{n \to \infty} \Big\{ \frac{\delta_n t_n}{\sqrt{2}\, \log t_n} \Big\} > (c_0 + 1), \tag{2.17}$$

then $|r'(t_n)| \lesssim A \epsilon_n t_n^{-c_0}$.

As a result, as $n \to \infty$, the bias tends to 0 if (2.17) holds for some constant $c_0 > 0$, whether $\epsilon_n$ tends
to 0 or not.

Example 3. The bivariate random variables $(u, \sigma^2)$ have a smooth joint density. We
re-center $u$ and $\sigma^2$ by letting $\delta = u - u_0$ and $\kappa = (\sigma^2 - \sigma_0^2)/2$. Denote the joint density of
$(\delta, \kappa)$ by $h_n(\cdot, \cdot)$. We show that $r'(t_n) = o(1)$ under mild smoothness conditions on $h_n(\cdot, \cdot)$.
In detail, for each fixed $\delta > 0$, let $h_n^{FT}(\cdot | \delta)$ be the Fourier transform of the conditional density
$h_n(\kappa | \delta)$. Fix $\alpha > 0$. Suppose that there is a generic constant $C > 0$ such that for all $\delta$ in the
range,

$$|h_n^{FT}(t_n | \delta)| \le C (1 + |t_n|)^{-\alpha}, \qquad \Big| \frac{d}{dt} h_n^{FT}(t_n | \delta) \Big| \le C (1 + |t_n|)^{-(\alpha + 1)}. \tag{2.18}$$

We have the following lemma, whose proof is elementary so we omit it.

Lemma 2.6 Suppose (2.18) holds for some constants $C > 0$ and $\alpha > 0$. Then there is a generic
constant $C > 0$ such that

$$|r'(t_n)| \le C \epsilon_n |t_n|^{-(2\alpha + 1)}.$$

Note that $r'(t_n) \to 0$ in a much broader setting than that in this example.

Combining Lemmas 2.3-2.4 and the above examples, we have the following theorem.

Theorem 2.1 Fix $\epsilon_0 \in (0, 1/2)$, $A > 0$, and $\gamma \in (0, 1/A)$. Suppose that when $n$ tends to $\infty$, at
least one of the three conditions below holds:

a. $\lim_{n \to \infty} (\epsilon_n \cdot t_n(\gamma)) = 0$.

b. $P_{H_n}(u \ge u_0 + \delta_n) = 1$, where $\delta_n$ satisfies (2.17) for some constant $c_0 > 0$.

c. (2.18) holds for some parameter $\alpha > 0$.

Then the estimators $u_0(\varphi_n; t_n(\gamma))$ and $\sigma_0^2(\varphi_n; t_n(\gamma))$ are consistent for the null parameters
$u_0$ and $\sigma_0^2$, respectively, uniformly across all densities in $\Lambda_n(\epsilon_0, A)$ that satisfy one or
more of the conditions (a), (b), and (c).

We remark that while choosing $\gamma \in (0, 1/A)$ ensures consistency, different choices of $\gamma$ affect the
convergence rate of the estimators. The optimal choice of $\gamma$ depends on unknown parameters
and is hard to set. In Section 4, we investigate how to choose $\gamma$ with simulated data. In our
experience, when $A$ is not very large, it is usually appropriate to choose $\gamma = 0.2$.

We also remark that in Theorem 2.1 (as well as Theorems 3.1-3.2 below), we have assumed
independence of the test statistics $X_j$. When the test statistics $X_j$ are correlated, the bias of
the estimators remains the same, but the variance of the estimators may be inflated by a factor. On
the other hand, if the correlation is relatively weak, the estimators continue to perform well. In
Section 4, we investigate a simulation example with block-wise dependence among the $X_j$. The
simulation results suggest that the estimators continue to perform well when the block size is
small (e.g., $\le 100$). See Section 4 and Figure 3 for the details.

3 Estimating the proportion of non-null effects 

The proportion has an identifiability issue that is very similar to that of the null parameters. The 
issue can also be resolved similarly in GEV and GEM. In [21 [51 [3], we have carefully investigated 
the problem of estimating the proportion in GEV. Similarly to estimating the null parameters, a 
Fourier approach was introduced (e.g. 0[9]). Compared to existing approaches in the literature 
(e.g. [SJ dni HH HH H31 [5] ) , the Fourier approach was proven to be successful in a much broader 
setting. Especially, it was shown to be successful without the so-called purity condition, a notion 
introduced in [BJ. Later in [3J, the approach was shown to also attain the optimal rate of 
convergence over a wide class of situations. 

We now shift our attention to GEM. Despite the encouraging development, the Fourier 
approach in [5J [S] ceases to perform well in this case. In fact, in this case, it can be shown that 
none of the aforementioned approaches is uniformly consistent with the proportion. Therefore, 
it is necessary to develop a new approach. 

In this section, we propose a new approach to estimating the proportion by using the general- 
ized Fourier transformation, as a natural extension of the ideas in preceding sections. We discuss 
two cases separately: the case where the null parameters are known, and the case where the null 
parameters are unknown. In both cases, we show that under mild conditions, the proposed 
approach is uniformly consistent with the proportion. 

3.1 Known null parameters 

Recall that

$$\varphi_0(t) = (1 - \epsilon_n)\, e^{\omega u_0 t + i\sigma_0^2 t^2/2}.$$

The key observation is that, when the null parameters $(u_0, \sigma_0^2)$ are known, $\epsilon_n$ can be easily solved
from $\varphi_0(t)$ by

$$\epsilon_n = 1 - e^{-\omega u_0 t - i\sigma_0^2 t^2/2}\, \varphi_0(t).$$

Inspired by this, we introduce the functional

$$\epsilon_n(g; t, u_0, \sigma_0^2) = 1 - e^{-\omega u_0 t - i\sigma_0^2 t^2/2}\, g(t), \tag{3.19}$$

where $g(t)$ is any complex-valued function. Recall that

$$\varphi_n(t) \approx \varphi(t) \approx \varphi_0(t).$$

By the continuity of the functional, we hope that for an appropriately chosen $t$,

$$\epsilon_n(\varphi_n; t, u_0, \sigma_0^2) \approx \epsilon_n(\varphi_0; t, u_0, \sigma_0^2) = \epsilon_n.$$

We now analyze the variance and the bias of this estimator. As before, let $t_n(\gamma) = \sqrt{\gamma \log n}$.
For any $H_n(\cdot, \cdot)$ satisfying $P_{H_n}(\sigma^2 \le A) = 1$, direct calculations show that

$$\mathrm{Var}\big(\epsilon_n(\varphi_n; t_n, u_0, \sigma_0^2)\big) \le \frac{1}{n}\, e^{A t_n^2} = n^{A\gamma - 1},$$

so the standard deviation of the estimator is of the order of $o(\epsilon_n)$ when

$$n^{A\gamma - 1} = o(\epsilon_n^2).$$

At the same time, by elementary calculus, the bias of the estimator equals

$$|E[\epsilon_n(\varphi_n; t_n, u_0, \sigma_0^2)] - \epsilon_n| = \epsilon_n \cdot \Big| \int e^{\omega(u - u_0)t_n + i(\sigma^2 - \sigma_0^2)t_n^2/2}\, dH_n(u, \sigma) \Big|,$$

which is of the order of $o(\epsilon_n)$ if either of the aforementioned conditions (b) or (c) holds. Combining
these gives the following theorem.

Theorem 3.1 Fix $u_0$, $A > 0$, $\sigma_0^2 \in (0, A)$, $\gamma \in (0, 1/A)$, and a sequence of positive numbers
$b_n$ satisfying $\lim_{n \to \infty} b_n = 0$. Consider a sequence of parameters $\epsilon_n \in (0, 1)$ and a sequence of
bivariate distributions $H_n(u, \sigma)$ such that for sufficiently large $n$, $P_{H_n}(u > u_0, \sigma^2 \le A) = 1$ and
$\epsilon_n^{-2} n^{A\gamma - 1} \le b_n$. Also, suppose that when $n$ tends to $\infty$, at least one of the two conditions below
holds:

b. $P_{H_n}(u \ge u_0 + \delta_n) = 1$, where $\delta_n$ satisfies (2.17) for some constant $c_0 > 1$.

c. (2.18) holds for some parameter $\alpha > 0$.

Then as $n \to \infty$, except for a probability that tends to zero,

$$\Big| \frac{\epsilon_n(\varphi_n; t_n(\gamma), u_0, \sigma_0^2)}{\epsilon_n} - 1 \Big| \to 0,$$

uniformly for all $\epsilon_n$ and $H_n(\cdot, \cdot)$ satisfying the conditions above.

In other words, $\epsilon_n(\varphi_n; t_n(\gamma), u_0, \sigma_0^2)$ is uniformly consistent for $\epsilon_n$ provided that either (b) or (c)
holds, and that the standard deviation of the estimator is of a smaller order than $\epsilon_n$. The latter is
satisfied when $\epsilon_n$ tends to 0 slowly enough.
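The known-null estimator can be sketched in Python as below. This is a hedged illustration: the functional is complex-valued, and reporting its real part is a presentation choice of this sketch rather than something stated in the text; the demo parameters, seed, and function names are hypothetical.

```python
import numpy as np

OMEGA = -(1 + 1j) / np.sqrt(2)  # omega in the text; OMEGA**2 equals 1j


def eps_functional(g, t, u0, var0):
    """The functional 1 - exp(-omega*u0*t - i*var0*t^2/2) * g(t), with g = g(t)."""
    return 1 - np.exp(-OMEGA * u0 * t - 1j * var0 * t**2 / 2) * g


def proportion_known_null(x, t, u0, var0):
    """Evaluate the functional at the generalized empirical characteristic function."""
    phi_n = np.exp(OMEGA * t * x).mean()
    return eps_functional(phi_n, t, u0, var0)


# Hypothetical demo: null N(-1, 1), proportion 0.05, elevated-mean alternative.
rng = np.random.default_rng(2)
n, eps, u0, sigma0 = 200_000, 0.05, -1.0, 1.0
k = int(round(n * eps))
x = np.concatenate([
    rng.normal(u0, sigma0, n - k),
    rng.normal(rng.uniform(1, 2, k), rng.uniform(0.5, 1.5, k)),
])
t = np.sqrt(0.2 * np.log(n))
eps_hat = proportion_known_null(x, t, u0, sigma0**2)
print(eps_hat.real, eps_hat.imag)  # real part estimates eps; imaginary part is noise plus bias
```

Evaluating `eps_functional` at the exact $\varphi_0(t)$ returns the proportion exactly, which mirrors the key observation that opens this subsection.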



3.2 Unknown null parameters 

When the null parameters are unknown, a natural approach is to estimate the null parameters
using the approach in Section 2 first, then plug in the estimated values to estimate the proportion.
In other words, we first estimate the null parameters by

$$\hat{u}_0(\gamma) = u_0(\varphi_n; t_n(\gamma)), \qquad \hat{\sigma}_0^2(\gamma) = \sigma_0^2(\varphi_n; t_n(\gamma)). \tag{3.20}$$

We then estimate the proportion by the plug-in estimator,

$$\epsilon_n(\varphi_n; t_n(\gamma), \hat{u}_0(\gamma), \hat{\sigma}_0(\gamma)) = 1 - e^{-\omega \hat{u}_0(\gamma) t_n(\gamma) - i\hat{\sigma}_0^2(\gamma) t_n^2(\gamma)/2}\, \varphi_n(t_n(\gamma)).$$

Note that the biases of both $\hat{u}_0(\gamma)$ and $\hat{\sigma}_0(\gamma)$ are typically of the order of $o(\epsilon_n)$, and their variances
are of the same order as that of $\epsilon_n(\varphi_n; \gamma)$. Therefore, replacing $(u_0, \sigma_0)$ by $(\hat{u}_0(\gamma), \hat{\sigma}_0(\gamma))$ does
not increase the order of either the bias or the variability of the estimator. The following theorem
is proved in Section 5.

Theorem 3.2 Fix $u_0$, $A > 0$, $\sigma_0^2 \in (0, A)$, $\gamma \in (0, 1/A)$, and a sequence of positive numbers
$b_n$ satisfying $\lim_{n \to \infty} b_n = 0$. Consider a sequence of parameters $\epsilon_n \in (0, 1)$ and a sequence of
bivariate distributions $H_n(u, \sigma)$ such that for sufficiently large $n$, $P_{H_n}(u > u_0, \sigma^2 \le A) = 1$ and
$\epsilon_n^{-2} n^{A\gamma - 1} \le b_n$. Also, suppose that when $n$ tends to $\infty$, at least one of the two conditions below
holds:

b. $P_{H_n}(u \ge u_0 + \delta_n) = 1$, where $\delta_n$ satisfies (2.17) for some constant $c_0 > 1$.

c. (2.18) holds for some parameter $\alpha > 0$.

Then as $n \to \infty$, except for a probability that tends to zero,

$$\Big| \frac{\epsilon_n(\varphi_n; t_n(\gamma), \hat{u}_0(\gamma), \hat{\sigma}_0(\gamma))}{\epsilon_n} - 1 \Big| \to 0,$$

uniformly for all $\epsilon_n$ and $H_n(\cdot, \cdot)$ satisfying the conditions above.
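The plug-in procedure can be sketched as follows, assuming the functionals $u_0(g; t)$ and $\sigma_0^2(g; t)$ from Section 1.3 and the proportion functional from Section 3.1. The helper names and the default $\gamma = 0.2$ are illustrative assumptions; the code is a sketch of the idea rather than the authors' implementation.

```python
import numpy as np

OMEGA = -(1 + 1j) / np.sqrt(2)  # omega in the text; OMEGA**2 equals 1j


def plug_in_estimates(g, dg, t):
    """Return (u0_hat, var0_hat, eps_hat) from g = phi_n(t) and dg = phi_n'(t)."""
    mod2 = abs(g) ** 2
    # -sqrt(2) * (d/dt |g|) / |g|, using d/dt |g| = Re(conj(g) g') / |g|
    u0_hat = -np.sqrt(2) * (np.conj(g) * dg).real / mod2
    var0_hat = np.sqrt(2) * (OMEGA * np.conj(g) * dg).real / (t * mod2)
    # plug the estimated null parameters into the proportion functional
    eps_hat = 1 - np.exp(-OMEGA * u0_hat * t - 1j * var0_hat * t**2 / 2) * g
    return u0_hat, var0_hat, eps_hat


def plug_in_from_sample(x, gamma=0.2):
    """Compute all three estimates at t = sqrt(gamma * log n) from a sample x."""
    t = np.sqrt(gamma * np.log(len(x)))
    e = np.exp(OMEGA * t * x)
    return plug_in_estimates(e.mean(), (OMEGA * x * e).mean(), t)


# Sanity check on the exact phi_0 with hypothetical parameters.
t, u0, var0, eps = 1.2, -1.0, 1.0, 0.05
g = (1 - eps) * np.exp(OMEGA * u0 * t + 1j * var0 * t**2 / 2)
dg = (OMEGA * u0 + 1j * var0 * t) * g
print(plug_in_estimates(g, dg, t))
```

All three estimates are computed from the same pair $(\varphi_n(t), \varphi_n'(t))$, so a single pass over the data suffices.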

4 Simulations 

In this section, we conduct simulation studies to investigate the performance of the proposed
estimators of $(u_0, \sigma_0^2, \epsilon)$ with a finite $n$. We write for short

$$\hat{u}_0(\gamma) = u_0(\varphi_n; t_n(\gamma)), \qquad \hat{\sigma}_0^2(\gamma) = \sigma_0^2(\varphi_n; t_n(\gamma)), \qquad \hat{\epsilon}_n(\gamma) = \epsilon_n(\varphi_n; t_n(\gamma), \hat{u}_0(\gamma), \hat{\sigma}_0(\gamma)).$$

Specifically, we are interested in four aspects: (1) how different choices of $\gamma$ affect the estimation
errors of $\hat{u}_0(\gamma)$, $\hat{\sigma}_0^2(\gamma)$ and $\hat{\epsilon}_n(\gamma)$, and what values of $\gamma$ we should recommend in practice; (2) the
effect of different choices of the proportion $\epsilon$ and the mixing distribution $H_n(\cdot, \cdot)$; (3) the effect
of larger $n$; and (4) the effect of dependence structures.

Example 1. Different choices of the tuning parameter $\gamma$. In this example, we let $n = 50{,}000$,
$(u_0, \sigma_0) = (-1, 1)$, and $\epsilon = 0.025 \times (1, 2, 3, 4, 8)$. We choose 20 different values of $\gamma$ ranging from
0.01 to 0.5 with equal spacing. For each combination of $(\epsilon, \gamma)$, we conduct an experiment with
the following four steps.

• Step 1. For each $1 \le j \le n(1 - \epsilon)$, draw $X_j \sim N(u_0, \sigma_0^2)$ to represent a null effect.

• Step 2. For each $n(1 - \epsilon) + 1 \le j \le n$, draw independently a sample $u \sim \mathrm{Uniform}(1, 2)$ and
a sample $\sigma \sim \mathrm{Uniform}(0.5, 1.5)$. Then, draw $X_j \sim N(u, \sigma^2)$ to represent a non-null effect.

• Step 3. Calculate $\hat{u}_0(\gamma)$, $\hat{\sigma}_0^2(\gamma)$, and $\hat{\epsilon}_n(\gamma)$.

• Step 4. Repeat Steps 1-3 100 times.
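The four steps above can be sketched as a small Python experiment. Several things are assumptions of this sketch rather than details from the paper: it uses 20 repetitions instead of 100 to keep the run short, fixes $\epsilon = 0.05$ and $\gamma = 0.2$, reports the real part of the complex-valued proportion estimate, and uses hypothetical function names and seed.

```python
import numpy as np

OMEGA = -(1 + 1j) / np.sqrt(2)  # omega in the text; OMEGA**2 equals 1j


def estimates(x, gamma):
    """Step 3: u0_hat, var0_hat and the real part of eps_hat at t = t_n(gamma)."""
    t = np.sqrt(gamma * np.log(len(x)))
    e = np.exp(OMEGA * t * x)
    g, dg = e.mean(), (OMEGA * x * e).mean()   # phi_n(t) and phi_n'(t)
    mod2 = abs(g) ** 2
    u0_hat = -np.sqrt(2) * (np.conj(g) * dg).real / mod2
    var0_hat = np.sqrt(2) * (OMEGA * np.conj(g) * dg).real / (t * mod2)
    eps_hat = 1 - np.exp(-OMEGA * u0_hat * t - 1j * var0_hat * t**2 / 2) * g
    return u0_hat, var0_hat, eps_hat.real


def one_experiment(n, eps, u0, sigma0, gamma, rng):
    n_null = int(round(n * (1 - eps)))
    k = n - n_null
    x_null = rng.normal(u0, sigma0, n_null)                     # Step 1
    u, sigma = rng.uniform(1, 2, k), rng.uniform(0.5, 1.5, k)
    x_alt = rng.normal(u, sigma)                                # Step 2
    return estimates(np.concatenate([x_null, x_alt]), gamma)    # Step 3


def mse_table(n=50_000, eps=0.05, u0=-1.0, sigma0=1.0, gamma=0.2, reps=20, seed=0):
    rng = np.random.default_rng(seed)
    est = np.array([one_experiment(n, eps, u0, sigma0, gamma, rng)
                    for _ in range(reps)])                      # Step 4
    truth = np.array([u0, sigma0**2, eps])
    return ((est - truth) ** 2).mean(axis=0)


mse = mse_table()
print(dict(zip(["u0", "sigma0^2", "eps"], mse)))
```

Looping `mse_table` over a grid of `gamma` and `eps` values would give the raw numbers behind a plot of the kind described for Figure 1, though exact values will differ from the paper's because of the reduced repetition count and different random draws.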

The results are reported in Figure 1, from which we can see that the MSEs are smallest when
$\gamma \in (0.15, 0.25)$. Also, the MSEs are not sensitive to the choice of $\gamma$ within this range: they remain
about the same for different $\gamma \in (0.15, 0.25)$. All three estimators $\hat{u}_0(\gamma)$, $\hat{\sigma}_0^2(\gamma)$, and $\hat{\epsilon}_n(\gamma)$ have
satisfactory performance: when $\gamma = 0.2$, the MSEs are as small as the order of $10^{-4}$. Somewhat
surprisingly, in this example, different $\epsilon$ do not have a prominent effect on the MSE.

Example 2. The effect of different mixing distributions $H_n(\cdot,\cdot)$. In this example, we set $n = 50{,}000$,
$(u_0, \sigma_0^2, \epsilon) = (-1, 1, 0.05)$, and choose 20 different values of $\gamma$, equally spaced from 0.01 to 0.5.
Compared to Example 1, we conduct experiments with different choices of the

Figure 1: MSE for $\hat u_0(\gamma)$ (left), $\hat\sigma_0^2(\gamma)$ (middle) and $\hat\epsilon_n(\gamma)$ (right) for 100 repetitions. The x-axis
displays $\gamma$, and the y-axis displays the MSE. The null parameters are $(u_0, \sigma_0^2) = (-1, 1)$. Different
colors of the curves represent different values of $\epsilon$.



Table 1: MSE for $\hat u_0(\gamma)$, $\hat\sigma_0^2(\gamma)$, and $\hat\epsilon_n(\gamma)$ for different $n$, where we take $\gamma = 0.2$. The parameters
are $(u_0, \sigma_0^2, \epsilon) = (-1, 1, 0.05)$. For the mixing distribution $H_n(u,\sigma)$, $u \sim \mathrm{Uniform}(1, 2)$ and $\sigma \sim \mathrm{Uniform}(0.5, 1.5)$
independently. In each cell, the MSE equals the cell value times $10^{-4}$.

n                                  10^4     3 x 10^4   5 x 10^4   8 x 10^4   10^5
MSE for $\hat u_0(\gamma)$         41.28    10.46      5.66       4.01       2.73
MSE for $\hat\sigma_0^2(\gamma)$   16.47    6.93       2.36       1.81       1.48
MSE for $\hat\epsilon_n(\gamma)$   20.13    5.28       4.17       2.87       2.01



mixing distribution $H_n(\cdot,\cdot)$. We consider two scenarios. In the first scenario, independently,
$(u - u_0) \sim \mathrm{Gamma}(10, 0.25)$ (where $\mathrm{Gamma}(k, \theta)$ is the Gamma distribution with shape parameter $k$
and scale parameter $\theta$), and $\sigma \sim \mathrm{Uniform}(0.5, 1.5)$. The parameters $(10, 0.25)$ are chosen such
that the mean of the random variable $u$ is 1.5, the same as that in the preceding example.
In the second scenario, independently, $u \sim \mathrm{Uniform}(1, 2)$ and $\sigma \sim \mathrm{Gamma}(10, 0.1)$.
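As a quick check of these parameter choices, the means follow directly from the shape-scale parameterization stated above (mean $= k\theta$). The second line is an added observation, not a claim made in the text:

```latex
E[u] = u_0 + k\theta = -1 + 10 \times 0.25 = 1.5
  \quad \text{(first scenario)}, \\
E[\sigma] = k\theta = 10 \times 0.1 = 1
  \quad \text{(second scenario)}.
```

These agree with the means of $\mathrm{Uniform}(1, 2)$ and $\mathrm{Uniform}(0.5, 1.5)$ used in Example 1, so the two scenarios change the shape of the mixing distribution while keeping its center comparable.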

For each scenario and each $\gamma$, we run experiments following Steps 1-4 as in Example 1, but
with the current choice of $H_n(\cdot,\cdot)$. The MSE for $\hat u_0(\gamma)$, $\hat\sigma_0^2(\gamma)$ and $\hat\epsilon_n(\gamma)$ are reported in Figure 2.
From this figure, a similar conclusion can be drawn: the estimators perform well in both
scenarios, with the MSE as small as $10^{-4}$ to $10^{-3}$. The best range of $\gamma$ is $(0.15, 0.2)$. In this range,
the MSE is relatively insensitive to the choice of $\gamma$.

It is noteworthy that in the first scenario, the support of the random variable $u$ is not bounded
away from the null parameter $u_0$. It is also noteworthy that in the second scenario, $\sigma$ is
unbounded, so that the assumption (1.4) is violated. Despite the apparent challenges in these
two scenarios, the proposed approach continues to perform well. This suggests that the proposed
approach is successful in a broader range of situations than those considered in Sections 2 and 3.

Example 3. The effect of larger $n$. In this example, we fix $(u_0, \sigma_0^2, \epsilon) = (-1, 1, 0.05)$. Since the
MSE is relatively insensitive to the choice of $\gamma$, we fix $\gamma = 0.2$. For the mixing distribution
$H_n(\cdot,\cdot)$, we let $u \sim \mathrm{Uniform}(1, 2)$ and $\sigma \sim \mathrm{Uniform}(0.5, 1.5)$, independently of each other.
According to the asymptotic analysis in the preceding sections, the performance
of the proposed estimators should improve as $n$ increases. In this example, we validate this point by
choosing $n = 10^4 \times (1,3,5,8,10)$. For each $n$, we run experiments following Steps 1-4 as in
Example 1. The results are summarized in Table 1. The MSE of each of $\hat u_0(\gamma)$, $\hat\sigma_0^2(\gamma)$ and $\hat\epsilon_n(\gamma)$
decreases as $n$ increases. This fits well with the asymptotic analysis in Sections 2 and 3.
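One descriptive way to read Table 1 is to fit a straight line to log(MSE) against log(n). The sketch below does this with the values reported in the table; the fit is an added illustration, not part of the original analysis, and the dictionary keys are hypothetical labels.

```python
import numpy as np

# Sample sizes and MSE values as reported in Table 1 (cell value times 1e-4).
n = np.array([1e4, 3e4, 5e4, 8e4, 1e5])
mse_table = {
    "u0_hat": np.array([41.28, 10.46, 5.66, 4.01, 2.73]) * 1e-4,
    "sigma0sq_hat": np.array([16.47, 6.93, 2.36, 1.81, 1.48]) * 1e-4,
    "eps_hat": np.array([20.13, 5.28, 4.17, 2.87, 2.01]) * 1e-4,
}

def loglog_slope(n, mse):
    """Least-squares slope of log(mse) against log(n)."""
    slope, _intercept = np.polyfit(np.log(n), np.log(mse), 1)
    return float(slope)

slopes = {name: loglog_slope(n, values) for name, values in mse_table.items()}
for name, slope in slopes.items():
    print(f"{name}: log-log slope = {slope:.2f}")
```

A slope close to -1 would correspond to the MSE shrinking roughly in proportion to 1/n over this range. With only five sample sizes the fit is indicative at best.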

Figure 2: MSE for $\hat u_0(\gamma)$ (red), $\hat\sigma_0^2(\gamma)$ (green) and $\hat\epsilon_n(\gamma)$ (blue) for Scenario 1 (left) and Scenario
2 (right) considered in Example 2. The x-axis displays $\gamma$ and the y-axis displays the MSE.
Here $n = 50{,}000$ and $(u_0, \sigma_0^2, \epsilon) = (-1, 1, 0.05)$.



Example 4. The effect of dependence. In this example, we fix $n = 50{,}000$ and $(u_0, \sigma_0^2, \epsilon) =
(-1, 1, 0.05)$. We investigate how dependent structures may affect the performance of the
proposed procedures. For each $L$ ranging from 1 to 250 in increments of 10, we generate
samples as follows.

1. For each $1 \le j \le n(1-\epsilon)$, set $(u_j, \sigma_j) = (u_0, \sigma_0)$.

2. For each $n(1-\epsilon) + 1 \le j \le n$, draw $u_j \sim \mathrm{Uniform}(1, 2)$ and $\sigma_j \sim \mathrm{Uniform}(0.5, 1.5)$.

3. Draw $w_1, \ldots, w_{n+L}$ independently from $N(0, 1)$. For $1 \le j \le n$, let $z_j = (L+1)^{-1/2} \sum_{k=j}^{j+L} w_k$.
Note that marginally $z_j \sim N(0, 1)$.

4. For $1 \le j \le n$, let $X_j = u_j + \sigma_j \cdot z_j$.
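The sampling scheme above can be sketched as follows. This is an illustration under a stated assumption: the formula for $z_j$ is damaged in the source, so the code takes $z_j$ to be the sum of the $L+1$ consecutive terms $w_j, \ldots, w_{j+L}$ divided by $\sqrt{L+1}$, which is the scaling that makes $z_j$ marginally $N(0,1)$. The function name and defaults are hypothetical.

```python
import numpy as np

def simulate_block_dependent(n=50_000, eps=0.05, L=10, u0=-1.0, sigma0=1.0, seed=0):
    """Generate block-wise dependent test statistics X_j = u_j + sigma_j * z_j."""
    rng = np.random.default_rng(seed)
    n_null = int(round(n * (1 - eps)))

    # Items 1-2: per-coordinate means and standard deviations.
    u = np.full(n, u0, dtype=float)
    sigma = np.full(n, sigma0, dtype=float)
    u[n_null:] = rng.uniform(1.0, 2.0, size=n - n_null)
    sigma[n_null:] = rng.uniform(0.5, 1.5, size=n - n_null)

    # Item 3 (assumed form): normalized moving sum of L + 1 i.i.d. N(0, 1) draws.
    w = rng.normal(size=n + L)
    csum = np.concatenate([[0.0], np.cumsum(w)])
    z = (csum[L + 1:] - csum[:n]) / np.sqrt(L + 1)   # z[j] = sum(w[j : j+L+1]) / sqrt(L+1)

    # Item 4: the observed statistics.
    return u + sigma * z, z
```

Under this assumed form, $z_j$ and $z_{j'}$ share terms only when $|j - j'| \le L$, so the correlation is $1 - |j-j'|/(L+1)$ within that range and zero beyond it, which matches the description of block-wise dependence controlled by $L$.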

The data generated in this way are block-wise dependent, with the block size controlled
by $L$. Fix $\gamma = 0.2$. We calculate $\hat u_0(\gamma)$, $\hat\sigma_0^2(\gamma)$, and $\hat\epsilon_n(\gamma)$, and repeat the experiment
100 times. We then calculate the MSE. The results are summarized in Figure 3. While the MSE
increases with the block size $L$, it remains small when, say, $L \le 50$ (all
three curves fall below 0.02). This suggests that the proposed methods are relatively robust to
short-range dependence.



5 Proofs 



5.1 Proof of Lemmas 1.1 and 1.2

We prove Lemma 1.1 first. Consider two density functions $f_k(x) = f_k(x; u_0^{(k)}, (\sigma_0^2)^{(k)}, \epsilon^{(k)}, H^{(k)})$
that satisfy (1.2)-(1.3), $k = 1, 2$. For short, denote $(u_k, \sigma_k^2, \epsilon_k, H_k) = (u_0^{(k)}, (\sigma_0^2)^{(k)}, \epsilon^{(k)}, H^{(k)})$.
Suppose $f_1 = f_2$. We want to show that $(u_1, \sigma_1, \epsilon_1) = (u_2, \sigma_2, \epsilon_2)$. Note that the Fourier
transforms of $f_1$ and $f_2$ must be identical. By direct calculations, with $s_k(t)$ as defined earlier,

$$(1-\epsilon_1) e^{itu_1 - \sigma_1^2 t^2/2}(1 + s_1(t)) = (1-\epsilon_2) e^{itu_2 - \sigma_2^2 t^2/2}(1 + s_2(t)). \qquad (5.21)$$

We first show $\sigma_1 = \sigma_2$. By (5.21),

$$e^{(\sigma_1^2 - \sigma_2^2)t^2/2} = e^{it(u_1 - u_2)} \frac{(1-\epsilon_1)(1 + s_1(t))}{(1-\epsilon_2)(1 + s_2(t))}. \qquad (5.22)$$

Figure 3: MSE for $\hat u_0(\gamma)$ (red), $\hat\sigma_0^2(\gamma)$ (green) and $\hat\epsilon_n(\gamma)$ (blue) when the data $X_j$ are block-wise
dependent with a block size $L$ (displayed on the x-axis; see Example 4 for the details). The
parameters are $(u_0, \sigma_0^2, \epsilon) = (-1, 1, 0.05)$. For the mixing distribution $H_n(u, \sigma)$, $u \sim \mathrm{Uniform}(1, 2)$
and $\sigma \sim \mathrm{Uniform}(0.5, 1.5)$ independently.



Note that $|s_k(t)| \le \epsilon_k/(1-\epsilon_k) < 1$, where $|\epsilon_k| \le \epsilon_0 < 1/2$. Therefore, the right hand side of
(5.22) is bounded away, in modulus, from both 0 and $\infty$ by a constant. Letting $t$ tend to $\infty$ implies that
$\sigma_1 = \sigma_2$.

Next, we show $(u_1, \epsilon_1) = (u_2, \epsilon_2)$. By (5.21) and $\sigma_1 = \sigma_2$,

$$(1-\epsilon_1)(1 + s_1(t)) = (1-\epsilon_2) e^{it(u_2 - u_1)}(1 + s_2(t)). \qquad (5.23)$$



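Fix a small positive number $a > 0$, and let $\phi_a(t)$ be the density of $N(0, 1/a^2)$. Multiply both
sides of (5.23) by $\phi_a(t)$ and integrate with respect to $t$. By direct calculations and Fubini's theorem, the left
hand side of (5.23) becomes

$$(1-\epsilon_1) + \epsilon_1 \int \frac{a}{\sqrt{\sigma^2 - \sigma_1^2 + a^2}} \exp\left(-\frac{(u - u_1)^2}{2(\sigma^2 - \sigma_1^2 + a^2)}\right) dH_1(u, \sigma), \qquad (5.24)$$

and the right hand side of (5.23) becomes

$$(1-\epsilon_2) e^{-\frac{(u_2 - u_1)^2}{2a^2}} + \epsilon_2 \int \frac{a}{\sqrt{\sigma^2 - \sigma_2^2 + a^2}} \exp\left(-\frac{(u - u_1)^2}{2(\sigma^2 - \sigma_2^2 + a^2)}\right) dH_2(u, \sigma). \qquad (5.25)$$

Note that by the Dominated Convergence Theorem (DCT), for any fixed $H(\cdot,\cdot)$ satisfying $P_H(\sigma \ge \sigma_0) = 1$
and $P_H((u, \sigma) \ne (u_0, \sigma_0)) = 1$,

$$\lim_{a \to 0} \int \frac{a}{\sqrt{\sigma^2 - \sigma_0^2 + a^2}} \exp\left(-\frac{(u - u_0)^2}{2(\sigma^2 - \sigma_0^2 + a^2)}\right) dH(u, \sigma) = 0. \qquad (5.26)$$

Combining (5.24)-(5.26) gives $(1 - \epsilon_1) = \lim_{a \to 0}\left[(1-\epsilon_2)\exp\left(-\frac{(u_2 - u_1)^2}{2a^2}\right)\right]$, which immediately
implies $(u_1, \epsilon_1) = (u_2, \epsilon_2)$. This proves Lemma 1.1.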

Consider Lemma 1.2. The difference is now that both $f_k$ satisfy (1.2)-(1.4). Similarly, suppose
$f_1 = f_2$. We want to show that $(u_1, \sigma_1, \epsilon_1) = (u_2, \sigma_2, \epsilon_2)$. By direct calculations, the generalized
Fourier transforms of $f_k$ are $(1-\epsilon_k) e^{\omega u_k t + i\sigma_k^2 t^2/2}[1 + r_k(t)]$, $k = 1, 2$. It follows that

$$e^{\omega(u_1 - u_2)t + i(\sigma_1^2 - \sigma_2^2)t^2/2} = \frac{1-\epsilon_2}{1-\epsilon_1} \cdot \frac{1 + r_2(t)}{1 + r_1(t)}. \qquad (5.27)$$

Let $t \to \infty$ on both sides. By the condition $P_{H_k}(u > u_k) = 1$, $r_k(t) \to 0$. Comparing the
moduli of both sides gives $u_1 = u_2$ and $\epsilon_1 = \epsilon_2$. Combining this with (5.27), $e^{i(\sigma_1^2 - \sigma_2^2)t^2/2} =
\frac{1 + r_2(t)}{1 + r_1(t)}$. Letting $t \to \infty$, the right hand side tends to 1. Therefore, $\sigma_1 = \sigma_2$. □
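5.2 Proof of Lemma 2.1

It is sufficient to show that for any $t > 0$ and $f \in \Lambda_n(\epsilon_0, A)$,

$$\mathrm{Var}(\varphi_n(t)) \le \frac{1}{n} e^{-\sqrt{2}u_0 t + \sigma_0^2 t^2}\left[(1-\epsilon_n) + \epsilon_n e^{(A - \sigma_0^2)t^2}\right], \qquad (5.28)$$

and

$$\mathrm{Var}(\varphi_n'(t)) \le \frac{1}{n} e^{-\sqrt{2}u_0 t + A t^2}\left[(1-\epsilon_n)(\sigma_0^2 + 2u_0^2 + 4\sigma_0^4 t^2) + \epsilon_n\left(A + 2u_0^2 + 4A^2t^2 + \frac{4}{(et)^2}\right)\right]. \qquad (5.29)$$

In fact, once these are proved, the claim follows from (2.12)-(2.14) by taking $t = t_n(\gamma)$.
Consider (5.28). Direct calculations show that

$$\mathrm{Var}(\varphi_n(t)) \le \frac{1}{n} E[|e^{\omega t X_1}|^2] \le \frac{1}{n} E[e^{-\sqrt{2}tX_1}].$$

Direct calculations show that

$$E[e^{-\sqrt{2}tX_1}] = (1-\epsilon_n) e^{-\sqrt{2}u_0 t + \sigma_0^2 t^2} + \epsilon_n \int e^{-\sqrt{2}ut + \sigma^2 t^2} dH_n(u, \sigma).$$

The claim follows by $u_0 > -A$, $\sigma_0^2 \le A$, and $P_{H_n}(u > u_0, \sigma^2 \le A) = 1$.
Consider (5.29). Similarly,

$$\mathrm{Var}(\varphi_n'(t)) \le \frac{1}{n} E[|\omega X_1 e^{\omega t X_1}|^2] \le \frac{1}{n} E[X_1^2 e^{-\sqrt{2}tX_1}].$$

By direct calculations,

$$E[X_1^2 e^{-\sqrt{2}tX_1}] = I + II, \qquad (5.30)$$

where

$$I = (1-\epsilon_n)\left[\sigma_0^2 + (-u_0 + \sqrt{2}\sigma_0^2 t)^2\right] e^{-\sqrt{2}u_0 t + \sigma_0^2 t^2}$$

and

$$II = \epsilon_n \int \left[\sigma^2 + (-u + \sqrt{2}\sigma^2 t)^2\right] e^{-\sqrt{2}ut + \sigma^2 t^2} dH_n(u, \sigma).$$

By the Schwarz inequality,

$$(-u_0 + \sqrt{2}\sigma_0^2 t)^2 \le 2(u_0^2 + 2\sigma_0^4 t^2),$$
$$(-u + \sqrt{2}\sigma^2 t)^2 = (-u_0 - (u - u_0) + \sqrt{2}\sigma^2 t)^2 \le 2(u_0^2 + 2\sigma^4 t^2 + (u - u_0)^2).$$

So

$$I \le (1-\epsilon_n)\left[\sigma_0^2 + 2u_0^2 + 4\sigma_0^4 t^2\right] e^{-\sqrt{2}u_0 t + \sigma_0^2 t^2}, \qquad (5.31)$$

and

$$II \le \epsilon_n \left[\int (\sigma^2 + 2u_0^2 + 4\sigma^4 t^2) e^{-\sqrt{2}ut + \sigma^2 t^2} dH_n(u, \sigma) + 2e^{-\sqrt{2}u_0 t} \int (u - u_0)^2 e^{-\sqrt{2}(u - u_0)t + \sigma^2 t^2} dH_n(u, \sigma)\right]. \qquad (5.32)$$

Note that $\sup_{x > 0}\{x^2 e^{-\sqrt{2}tx}\} = 2/(et)^2$. It follows that

$$\int (u - u_0)^2 e^{-\sqrt{2}(u - u_0)t + \sigma^2 t^2} dH_n(u, \sigma) \le \left[2/(et)^2\right] \int e^{\sigma^2 t^2} dH_n(u, \sigma). \qquad (5.33)$$

Inserting (5.33) into (5.32) and recalling that $P_{H_n}(u > u_0, \sigma^2 \le A) = 1$,

$$II \le \epsilon_n \left[A + 2u_0^2 + 4A^2 t^2 + \frac{4}{(et)^2}\right] e^{-\sqrt{2}u_0 t + A t^2}. \qquad (5.34)$$

Inserting (5.31) and (5.34) into (5.30) gives the claim. □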
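The supremum used just before (5.33) follows from a short calculus argument, written out here for completeness. The name $h$ is introduced only for this check:

```latex
h(x) = x^2 e^{-\sqrt{2}\,t x}, \qquad
h'(x) = x\,e^{-\sqrt{2}\,t x}\,\bigl(2 - \sqrt{2}\,t x\bigr) = 0
\;\Longrightarrow\; x^* = \frac{\sqrt{2}}{t}, \\
\sup_{x > 0} h(x) = h(x^*) = \frac{2}{t^2}\,e^{-2} = \frac{2}{(et)^2}.
```

The same argument applied to $x e^{-xt/\sqrt{2}}$ gives a maximum of $\sqrt{2}/(et)$ at $x^* = \sqrt{2}/t$, which is the bound that appears in the proof of Lemma 2.3.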
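5.3 Proof of Lemma 2.2

For short, we drop $t$ from the functions whenever there is no confusion. For the first claim, by
direct calculations, we have:

$$u_0(g, t) - u_0(f, t) = I + II,$$

where $I = \left(1 - \frac{|g|^2}{|f|^2}\right) \cdot u_0(g, t)$ and $II = \frac{\sqrt{2}}{|f|^2} \cdot [\mathrm{Re}(g') \cdot \mathrm{Re}(f - g) + \mathrm{Im}(g') \cdot \mathrm{Im}(f - g) + \mathrm{Re}(f) \cdot \mathrm{Re}((f - g)') + \mathrm{Im}(f) \cdot \mathrm{Im}((f - g)')]$. Now, firstly, using the triangle inequality,

$$|I| \le \frac{|u_0(g, t)|}{|f|^2} \cdot \left||f|^2 - |g|^2\right| \le \frac{|u_0(g, t)|}{|f|^2} \cdot \left((|f| + |g|)|f - g|\right);$$

secondly, using the Cauchy-Schwarz inequality, $|\mathrm{Re}(z)\mathrm{Re}(w) + \mathrm{Im}(z)\mathrm{Im}(w)| \le |z| \cdot |w|$ for any complex
numbers $z$ and $w$, so it follows that

$$|II| \le \frac{\sqrt{2}}{|f|^2} \left[|g'| \cdot |f - g| + |f| \cdot |(f - g)'|\right].$$

Combining these gives

$$|u_0(g, t) - u_0(f, t)| \le \frac{1}{|f|^2} \left[\left(|u_0(g, t)|(|f| + |g|) + \sqrt{2}|g'|\right) \cdot |f - g| + \sqrt{2}|f| \cdot |(f - g)'|\right].$$

Consider the second claim. By direct calculations,

$$\sigma_0^2(g, t) - \sigma_0^2(f, t) = I + II,$$

where $I = \left(1 - \frac{|g|^2}{|f|^2}\right) \cdot \sigma_0^2(g, t)$ and $II = \frac{\sqrt{2}}{t|f|^2} \cdot [\mathrm{Re}(\omega \bar{g} g') - \mathrm{Re}(\omega \bar{f} f')]$. Similarly,

$$|I| \le \frac{|\sigma_0^2(g, t)|}{|f|^2} (|f| + |g|)|f - g|,$$

$$|II| \le \frac{\sqrt{2}}{t|f|^2} \left[|g'| \cdot |f - g| + |f| \cdot |(f - g)'|\right].$$

Combining these gives

$$|\sigma_0^2(g, t) - \sigma_0^2(f, t)| \le \frac{1}{t|f|^2} \cdot \left[\left(|\sigma_0^2(g, t)| \cdot t \cdot (|f| + |g|) + \sqrt{2}|g'|\right) \cdot |f - g| + \sqrt{2}|f| \cdot |(f - g)'|\right].$$

□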
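To make the two functionals in this proof concrete, the sketch below evaluates them on simulated pure-null data. The functionals are defined in Section 2, outside this excerpt, so the forms used here are assumptions chosen to be consistent with the decompositions above and with the exponent in (5.37): $\omega = -(1+i)/\sqrt{2}$, $u_0(f; t) = -\sqrt{2}\,\mathrm{Re}(\bar f f')/|f|^2$, and $\sigma_0^2(f; t) = \sqrt{2}\,\mathrm{Re}(\omega \bar f f')/(t|f|^2)$. The function names and the choice $t = 0.5$ are hypothetical, and this should not be read as the authors' implementation.

```python
import numpy as np

# Assumed value of omega (real part -1/sqrt(2), omega**2 = i); not stated in this section.
OMEGA = -(1 + 1j) / np.sqrt(2)

def empirical_gft(x, t):
    """Empirical transform phi_n(t) = mean(exp(omega*t*X_j)) and its derivative in t."""
    e = np.exp(OMEGA * t * x)
    return e.mean(), (OMEGA * x * e).mean()

def null_estimates(x, t):
    """Plug phi_n into the assumed functionals u_0(f; t) and sigma_0^2(f; t)."""
    phi, dphi = empirical_gft(x, t)
    mod2 = abs(phi) ** 2
    u0_hat = -np.sqrt(2) * (np.conj(phi) * dphi).real / mod2
    sigma0sq_hat = np.sqrt(2) * (OMEGA * np.conj(phi) * dphi).real / (t * mod2)
    return float(u0_hat), float(sigma0sq_hat)

# Pure-null sample: with no non-null effects, both functionals should be near the truth.
rng = np.random.default_rng(0)
x = rng.normal(-1.0, 1.0, size=200_000)
u0_hat, sigma0sq_hat = null_estimates(x, t=0.5)
print(u0_hat, sigma0sq_hat)
```

For an exact normal transform $e^{\omega u_0 t + i\sigma_0^2 t^2/2}$, these forms return $u_0$ and $\sigma_0^2$ exactly, which is the property that (5.35) relies on. With non-null effects present, the factor $1 + r(t)$ introduces the bias controlled in Lemma 2.4.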

5.4 Proof of Lemma 2.3

Write $t_n = t_n(\gamma)$ for short. Introduce the event

An = {max{|^„(t„) - tp(t n )\, y n {t n ) - tp'(t n )\} < log 3/2 (n)}. 

Applying Lemma 2.1, $P(A_n^c) \to 0$, uniformly for all $f \in \Lambda_n(\epsilon_0, A)$. To show the claim, it is
sufficient to show that the inequalities hold over the event $A_n$. Since the proofs are similar, we
only prove the first one.

We claim that (a) $1/|\varphi(t_n)| \le \tilde O(1)$ over the event $A_n$, (b) $|\varphi_n(t_n)| \sim |\varphi(t_n)|$ over the event $A_n$,
and (c) $|u_0(\varphi; t_n)| \le \tilde O(1)$. Consider (a) and (b). By $\epsilon_n \le \epsilon_0 < 1/2$ and elementary calculus,
$|r(t)| \le \epsilon_n/(1 - \epsilon_n) \le \epsilon_0/(1 - \epsilon_0)$. The claim follows from

Ht)\ > (1 - \r(t)\)\Mt)\ > l —^e-^'^. 

i — e 
By the definition of $A_n$,

bn(*n) - <p(tn)\ < O^" 1 )/ 2 ), 

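and the claims follow. Consider (c). By Lemma 2.3, $|u_0(\varphi; t_n)| \le |u_0| + |r'(t_n)|$. Write

$$r'(t) = \frac{\epsilon_n}{1-\epsilon_n} \int \left[\omega(u - u_0) + i(\sigma^2 - \sigma_0^2)t\right] e^{\omega(u - u_0)t + i(\sigma^2 - \sigma_0^2)t^2/2} dH_n(u, \sigma).$$

Since $\sup_{x > 0}\{x e^{-x}\} = 1/e$ and $P_{H_n}(u > u_0, \sigma^2 \le A) = 1$, the claim follows from

$$|r'(t_n)| \le \int \left[|u - u_0| e^{-(u - u_0)t_n/\sqrt{2}} + |\sigma^2 - \sigma_0^2| t_n\right] dH_n(u, \sigma) \le \frac{\sqrt{2}}{e t_n} + A t_n.$$

Finally, combine (a)-(c) with Lemma 2.2:

$$|u_0(\varphi_n; t_n) - u_0(\varphi; t_n)| \le \tilde O(1) \cdot \left[|\varphi_n(t_n) - \varphi(t_n)| + |\varphi_n'(t_n) - \varphi'(t_n)|\right],$$
and the claim follows. □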

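5.5 Proof of Lemma 2.4

For simplicity, drop $t$ from $\varphi(t)$, $\varphi_0(t)$, and $r(t)$ whenever there is no confusion. Consider the
first claim. Recalling that $|\varphi| = |\varphi_0| \times |1 + r|$,

$$\frac{d}{dt}|\varphi| = \left(\frac{d}{dt}|\varphi_0|\right) \cdot |1 + r| + |\varphi_0| \cdot \frac{d}{dt}|1 + r|.$$

Using the definition of $u_0(\varphi; t)$ and Lemma 1.3, it follows from direct calculations that

$$|u_0(\varphi; t) - u_0| = \frac{\sqrt{2}}{|1 + r(t)|} \cdot \left|\frac{d}{dt}|1 + r(t)|\right|. \qquad (5.35)$$

Moreover,

$$\frac{d}{dt}|1 + r(t)| = \frac{r'(t)(1 + \bar{r}(t)) + (1 + r(t))\bar{r}'(t)}{2|1 + r(t)|}. \qquad (5.36)$$

By $P_{H_n}(u > u_0) = 1$,

$$|r(t)| \le \frac{\epsilon_n}{1-\epsilon_n} \left|\int e^{-(u - u_0)t/\sqrt{2} + i\left[-(u - u_0)t/\sqrt{2} + (\sigma^2 - \sigma_0^2)t^2/2\right]} dH_n(u, \sigma)\right| \le \frac{\epsilon_n}{1-\epsilon_n}. \qquad (5.37)$$

Combining (5.35)-(5.37) gives the claim.
Consider the second claim. Write

$$\varphi' = \varphi_0'(1 + r) + \varphi_0 r'.$$

We have $\bar\varphi \varphi' = |1 + r|^2 \bar\varphi_0 \varphi_0' + |\varphi_0|^2 (1 + \bar{r}) r'$, and so

$$\mathrm{Re}(\omega \bar\varphi \varphi') = |1 + r|^2 \mathrm{Re}(\omega \bar\varphi_0 \varphi_0') + |\varphi_0|^2 \mathrm{Re}(\omega (1 + \bar{r}) r').$$

Therefore,

$$|\sigma_0^2(\varphi; t) - \sigma_0^2| \le \frac{\sqrt{2}\,|\mathrm{Re}(\omega(1 + \bar{r}(t))r'(t))|}{t|1 + r(t)|^2} \le C(\epsilon_0)|r'(t)|/t,$$

and the claim follows directly. □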






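5.6 Proof of Theorem 3.1

Write for short $t_n = t_n(\gamma)$ and $\epsilon_n(\cdot; t_n) = \epsilon_n(\cdot; t_n, u_0, \sigma_0^2)$. By the triangle inequality,

$$|\epsilon_n(\varphi_n; t_n) - \epsilon_n| \le |\epsilon_n(\varphi_n; t_n) - \epsilon_n(\varphi; t_n)| + |\epsilon_n(\varphi; t_n) - \epsilon_n|.$$

Compare this with the desired claim. It is sufficient to show that $E[|\epsilon_n(\varphi_n; t_n) - \epsilon_n(\varphi; t_n)|^2] \le
n^{4\gamma - 1}$ and that $|\epsilon_n(\varphi; t_n)/\epsilon_n - 1|$ tends to 0 at a speed that does not depend on $\epsilon_n$ and $H_n$.
Consider the first term first. By the definition of the functional $\epsilon_n(\cdot; t_n)$ (i.e., (3.19)),

$$|\epsilon_n(\varphi_n; t_n) - \epsilon_n(\varphi; t_n)| \le |e^{-\omega u_0 t_n - i\sigma_0^2 t_n^2/2}(\varphi_n(t_n) - \varphi(t_n))| \le e^{u_0 t_n/\sqrt{2}} |\varphi_n(t_n) - \varphi(t_n)|.$$

At the same time, by the definitions of $\varphi_n(\cdot)$ and $\varphi(\cdot)$ and elementary calculus,

$$E|\varphi_n(t_n) - \varphi(t_n)|^2 = \frac{1}{n}\mathrm{Var}(e^{\omega t_n X_1}) \le \frac{1}{n} E[e^{-\sqrt{2}t_n X_1}].$$

Combining these gives

$$E(|\epsilon_n(\varphi_n; t_n) - \epsilon_n(\varphi; t_n)|^2) \le \frac{1}{n} e^{\sqrt{2}u_0 t_n} E[e^{-\sqrt{2}t_n X_1}],$$

where by direct calculations and the assumptions $P_{H_n}(u > u_0, \sigma^2 \le A) = 1$ and $\sigma_0^2 \le A$, the
last term is no greater than

$$\frac{1}{n}\left((1 - \epsilon_n) e^{\sigma_0^2 t_n^2} + \epsilon_n \int e^{-\sqrt{2}(u - u_0)t_n + \sigma^2 t_n^2} dH_n(u, \sigma)\right) \le n^{4\gamma - 1}.$$

Combining these gives the first claim.

Consider the second claim. Recall that $\epsilon_n = \epsilon_n(\varphi_0; t_n)$, that $\varphi(t) = \varphi_0(t)(1 + r(t))$ (see (1.9)),
and that $\varphi_0(t_n) = (1 - \epsilon_n) e^{\omega u_0 t_n + i\sigma_0^2 t_n^2/2}$. By the definition of the functional $\epsilon_n(\cdot; t_n)$,

$$\epsilon_n(\varphi; t_n) - \epsilon_n = e^{-\omega u_0 t_n - i\sigma_0^2 t_n^2/2}(\varphi(t_n) - \varphi_0(t_n)) = (1 - \epsilon_n) r(t_n).$$

It then follows from the definition of $r(\cdot)$ that

$$|\epsilon_n(\varphi; t_n) - \epsilon_n| \le |(1 - \epsilon_n) r(t_n)| = \epsilon_n \left|\int e^{\omega(u - u_0)t_n + i(\sigma^2 - \sigma_0^2)t_n^2/2} dH_n(u, \sigma)\right|. \qquad (5.38)$$

Suppose condition (b) holds. Then $P_{H_n}(u > u_0 + \delta_n) = 1$, where $\delta_n$ satisfies (2.17) with some
constant $c_0 > 0$. It follows from (5.38) and elementary calculus that as $n \to \infty$,

$$|\epsilon_n(\varphi; t_n)/\epsilon_n - 1| \le \int e^{-(u - u_0)t_n/\sqrt{2}} dH_n(u, \sigma) \to 0. \qquad (5.39)$$

Suppose condition (c) holds. Let $\delta = u - u_0$ and $\kappa = \sigma^2 - \sigma_0^2$. By the definitions of $g(\kappa|\delta)$ and
$h(\delta)$ and elementary Fourier analysis,

$$\int e^{\omega(u - u_0)t_n + i(\sigma^2 - \sigma_0^2)t_n^2/2} dH_n(u, \sigma) = \int e^{\omega\delta t_n} g^{FT}(t_n^2/2; \delta) h(\delta) d\delta.$$

By the assumptions, $P_{H_n}(\delta > 0) = 1$ and $|g^{FT}(t)| \le C(1 + |t|)^{-\alpha}$, so

$$\left|\int e^{\omega\delta t_n} g^{FT}(t_n^2/2; \delta) h(\delta) d\delta\right| \le \int e^{-\delta t_n/\sqrt{2}} |g^{FT}(t_n^2/2; \delta)| h(\delta) d\delta \le C(1 + t_n^2/2)^{-\alpha} \to 0.$$

Combining these with (5.38) gives the claim. □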
5.7 Proof of Theorem 3.2

Write for short $t_n = t_n(\gamma)$, $\hat u_0 = u_0(\varphi_n; t_n)$, and $\hat\sigma_0^2 = \sigma_0^2(\varphi_n; t_n)$. By the definition of
$\epsilon_n(\cdot; t, u, \sigma)$,

$$|\epsilon_n(\varphi_n; t_n, \hat u_0, \hat\sigma_0^2) - \epsilon_n(\varphi_n; t_n, u_0, \sigma_0^2)| \le |e^{-\omega(\hat u_0 - u_0)t_n - i(\hat\sigma_0^2 - \sigma_0^2)t_n^2/2} - 1| \cdot |e^{-\omega u_0 t_n - i\sigma_0^2 t_n^2/2}\varphi_n(t_n)|,$$

where we note that by the definition of the functional $\epsilon_n(\cdot; t, u, \sigma)$, the last term is at most $1 + |\epsilon_n(\varphi_n; t_n, u_0, \sigma_0^2)|$.
By Lemmas 2.3-2.4, except for a small probability that tends to 0 as $n$ tends to $\infty$,

|fio - «o|*» < t n \r'{t n )\ + d{n^^l 2 ), \a 2 Q - a 2 \t 2 n < t n \r'{t n )\ + d{n A ^' 2 ). 

At the same time, by Theorem 3.1, except for a small probability that tends to 0 as $n$ tends to $\infty$,

Combining these, as $n$ tends to $\infty$, except for a small probability that tends to 0,

$$|\epsilon_n(\varphi_n; t_n, \hat u_0, \hat\sigma_0^2) - \epsilon_n(\varphi_n; t_n, u_0, \sigma_0^2)| \le t_n |r'(t_n)|,$$
which, by Lemmas 2.5-2.6, tends to 0. This concludes the proof. □